Kekulescope: Prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images
The application of convolutional neural networks (ConvNets) to harness
high-content screening images or 2D compound representations is gaining
increasing attention in drug discovery. However, existing applications often
require large data sets for training or sophisticated pretraining schemes.
Here, using 33 IC50 data sets from ChEMBL 23, we show that the in vitro activity
of compounds on cancer cell lines and protein targets can be accurately
predicted on a continuous scale from their Kekule structure representations
alone, by extending existing architectures that were pretrained on unrelated
image data sets. We show that the predictive power of the generated models is
comparable to that of Random Forest (RF) models and fully-connected Deep Neural
Networks trained on circular (Morgan) fingerprints. Notably, including
additional fully-connected layers further increases the predictive power of the
ConvNets by up to 10%. Analysis of the predictions generated by RF models and
ConvNets shows that simply averaging the outputs of the two yields
significantly lower prediction errors for multiple data sets than either model
alone, although the effect size is small. This indicates that the features
extracted by the convolutional layers of the ConvNets provide predictive signal
complementary to that of Morgan fingerprints. Lastly, we show that multi-task
ConvNets trained on compound images make it possible to model COX isoform
selectivity on a continuous scale, with prediction errors comparable to the
uncertainty of the data. Overall, in this work we present a set of ConvNet
architectures that predict compound activity from Kekule structure
representations with state-of-the-art performance and that require neither the
generation of compound descriptors nor sophisticated image processing
techniques.
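The consensus step described in the abstract (averaging RF and ConvNet outputs) can be illustrated with a minimal sketch. It assumes both models have already produced pIC50 predictions for the same test compounds and uses RMSE as the error metric; the prediction values and function names below are hypothetical placeholders, not data or code from the study.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def average_predictions(rf_pred: Sequence[float], convnet_pred: Sequence[float]) -> List[float]:
    """Unweighted, compound-by-compound mean of RF and ConvNet predictions."""
    if len(rf_pred) != len(convnet_pred):
        raise ValueError("both models must score the same compounds")
    return [(a + b) / 2.0 for a, b in zip(rf_pred, convnet_pred)]


# Hypothetical pIC50 values for four test compounds (illustrative, not from the study).
y_obs = [6.0, 7.0, 5.5, 8.0]
rf_pred = [6.4, 6.6, 5.9, 7.5]
convnet_pred = [5.7, 7.3, 5.2, 8.4]
ensemble_pred = average_predictions(rf_pred, convnet_pred)

for name, pred in [("RF", rf_pred), ("ConvNet", convnet_pred), ("Average", ensemble_pred)]:
    print(f"{name:8s} RMSE = {rmse(y_obs, pred):.3f}")
```

The toy errors of the two models were chosen to point in opposite directions, so the average lands close to the observed values. With real models the gain depends on how correlated their errors are, which is consistent with the small but significant improvement the abstract reports.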
Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel
Medicinal Chemistry
QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping.
Funder: FP7 People: Marie-Curie Actions; doi: http://dx.doi.org/10.13039/100011264; Grant(s): 238701
An affinity fingerprint is the vector of a compound's affinities or potencies against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint whose components are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared with that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC and EF5, comparable to those of the Morgan2 fingerprint. For similarity searching, QAFFP reaches an AUC of ~0.65 and ~0.70 and an EF5 of ~4.67 and ~5.82, depending on the data set, versus an AUC of ~0.57 and ~0.66 and an EF5 of ~4.09 and ~6.41 for Morgan2. For classification, QAFFP reaches an AUC of ~0.85 and an EF5 of ~2.10, versus an AUC of ~0.87 and an EF5 of ~2.16 for Morgan2. However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping: it retrieves 1146 of the 1749 existing scaffolds, whereas the Morgan2 fingerprint retrieves only 864.
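The fingerprint construction and the EF5 metric described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the QAFFP implementation: each "model" is a hypothetical callable standing in for a trained per-target Random Forest regressor (real models would take computed descriptors rather than a raw SMILES string), the 6.0 potency cutoff used to binarize is an assumed value, Tanimoto similarity on the binary version is an assumed choice, and EF is computed with the common definition (active rate in the top-ranked fraction divided by the overall active rate).

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained per-target regressor: maps a compound
# (here a SMILES string) to a predicted potency such as pIC50.
Model = Callable[[str], float]


def affinity_fingerprint(compound: str, panel: Sequence[Model]) -> List[float]:
    """Real-valued fingerprint: one predicted potency per target in the panel."""
    return [model(compound) for model in panel]


def binarize(fp: Sequence[float], threshold: float = 6.0) -> List[int]:
    """Binary fingerprint: 1 where the predicted potency reaches the (assumed) cutoff."""
    return [1 if value >= threshold else 0 for value in fp]


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient between two equal-length bit vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have equal length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def enrichment_factor(ranked_is_active: Sequence[bool], fraction: float = 0.05) -> float:
    """EF at `fraction`: active rate in the top-ranked fraction over the overall active rate."""
    n = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n == 0 or n_actives == 0:
        return 0.0
    n_top = max(1, round(n * fraction))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n)


# Toy three-target panel (QAFFP itself uses 440 Random Forest models).
panel: List[Model] = [
    lambda smi: 7.0 if "N" in smi else 5.0,
    lambda smi: 4.0 + 0.5 * smi.count("C"),
    lambda smi: 6.2,
]

query = binarize(affinity_fingerprint("CCN", panel))
candidate = binarize(affinity_fingerprint("CCO", panel))
print("query:", query, "candidate:", candidate, "Tanimoto:", tanimoto(query, candidate))
```

In a similarity search, every database compound would be scored against the query in this way, ranked by similarity, and the ranked activity labels passed to `enrichment_factor` to obtain EF5.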
Author Correction: Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing (Nature Genetics (2020) 52(3), 331-341, doi:10.1038/s41588-019-0576-7)
Correction to: Nature Genetics, published online 05 February 2020. In the published version of this paper, the members of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium were listed in the Supplementary Information; however, these members should have been included in the main paper. The original Article has been corrected to include the members and affiliations of the PCAWG Consortium in the main paper; these corrections have been made to the HTML version of the Article but not to the PDF version. Additional corrections to affiliations have been made to both the PDF and HTML versions of the original Article for consistency between the PCAWG list and the main paper.
Author Correction: Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition
Correction to: Nature Genetics https://doi.org/10.1038/s41588-019-0562-0, published online 05 February 2020
Author Correction: Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer
Correction to: Nature Genetics https://doi.org/10.1038/s41588-019-0564-y, published online 05 February 2020
Pan-cancer analysis of whole genomes
Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale(1-3). Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter(4); identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation(5,6); analyses timings and patterns of tumour evolution(7); describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity(8,9); and evaluates a range of more-specialized features of cancer genomes(8,10-18).